background: Prognosis of localised gastrointestinal stromal tumour (GIST) is heterogeneous, notably for patients with an AFIP intermediate or high risk of relapse, who are candidates for adjuvant imatinib. We hypothesised that gene expression profiles might improve prognostication and help to refine the indications for imatinib.

methods: We collected gene expression and histoclinical data from 146 pre-treatment samples of localised GIST treated with surgery alone. We searched for a gene expression signature (GES) predictive of relapse-free survival (RFS) and compared its performance with that of three published proliferation-based prognostic GES (Genomic Grade Index (GGI), 16-Kinase, and CINSARC) and with the AFIP classification. We also analysed a data set from 28 patients with advanced GIST treated with neo-adjuvant imatinib.
results: We identified a 275-gene GES predictive of RFS in a learning set and validated its robustness in an independent set. However, in uni- and multivariate analyses in two independent test sets, the GGI outperformed this signature, the two other signatures, and the AFIP intermediate-risk classification. Importantly, the GGI split the AFIP intermediate/high-risk samples into two groups with different RFS. Genomic Grade Index 'high-risk' tumours were more proliferative and genetically unstable than 'low-risk' tumours, and more sensitive to imatinib.
conclusion: GGI refines the prediction of RFS in localised GIST and might help tailor adjuvant imatinib. 
Gastrointestinal stromal tumours (GISTs) are the most frequent mesenchymal tumours of the digestive tract (Kitamura et al, 2003; Corless et al, 2011). Characterised by activating KIT or PDGFRA mutations, which render them sensitive to tyrosine kinase inhibitors (imatinib and sunitinib), they represent a model for molecular-based diagnosis (West et al, 2004) and treatment (Antonescu, 2011; Joensuu and Dematteo, 2012). The mainstay of treatment for localised tumours is complete surgical resection (Casali and Blay, 2010). However, the postoperative clinical outcome is heterogeneous, and 20-40% of patients will relapse. KIT and PDGFRA mutations are early molecular events in GIST development, but the alterations that accumulate during disease progression are poorly elucidated. The current prognostic criteria (anatomical site, pathological tumour size, mitotic count, and tumour rupture; Dematteo et al, 2008; Casali and Blay, 2010) are combined in several risk classifications: NIH 2002 (Fletcher et al, 2002), AFIP 2007 (Miettinen and Lasota, 2006), and Joensuu's classification (Joensuu, 2008). Based on recent studies (Dematteo et al, 2009; Joensuu et al, 2011), adjuvant imatinib is recommended for patients with a 'substantial risk of relapse' according to ESMO guidelines and with an 'intermediate to high risk' according to AFIP. In Europe, the AFIP classification is widely used. However, the 2-year relapse-free survival (RFS) without adjuvant imatinib is close to 75% in AFIP intermediate-risk patients and 50% in high-risk patients (Dematteo et al, 2009), meaning that many of these patients are in fact cured by surgery alone.

Today, the imperfection of these risk stratification systems makes it necessary to refine them to better tailor adjuvant treatment (Patel, 2011). Knowledge of molecular alterations may help achieve this. Prognostic value has been associated with the type of KIT/PDGFRA mutation (Heinrich et al, 2008a, b) and with a few markers (Schneider-Stock et al, 2005; Sabah et al, 2006; Schmieder et al, 2008). However, the results are sometimes contradictory, and these alterations, which are not currently considered established independent prognostic factors, are not included in risk classifications. High-throughput gene expression profiling may improve the prognostic classification of cancers, as observed in breast cancer (Bertucci et al, 2006). Several DNA microarray-based studies of GIST samples (Allander et al, 2001; Antonescu et al, 2004; Koon et al, 2004; Subramanian et al, 2004; Kang et al, 2005; Yamaguchi et al, 2008; Ostrowski et al, 2009; Rink et al, 2009; Astolfi et al, 2010; Yang et al, 2010; Arne et al, 2011; Ylipaa et al, 2011; Lagarde et al, 2012) have highlighted the transcriptional heterogeneity of the disease, notably in relation to location, mutational status, and risk group. However, these series are relatively small (60 cases for the largest), only four directly addressed the prognostic issue (Koon et al, 2004; Yamaguchi et al, 2008; Arne et al, 2011; Lagarde et al, 2012), and none identified a multigene predictor that was tested in an independent validation set.

Here, we analysed expression data of 146 imatinib-untreated 
localised GIST profiled using whole-genome DNA microarrays, 
and searched for a GES predictive for RFS after surgery. 



MATERIALS AND METHODS 
Tumour samples 

We collected from our databases (Kang et al, 2005; Ostrowski et al, 2009; Astolfi et al, 2010) and public databases (Yamaguchi et al, 2008; Rink et al, 2009; Lagarde et al, 2012) clinical and whole-genome expression data of pre-treatment samples from 146 patients with non-metastatic GIST (Kang et al, 2005; Yamaguchi et al, 2008; Ostrowski et al, 2009; Astolfi et al, 2010; Lagarde et al, 2012) treated with complete primary surgery without adjuvant imatinib (Table 1), and from 28 patients with advanced GIST treated with neo-adjuvant imatinib in the Radiation Therapy Oncology Group Study 0132 (RTOG0132; Rink et al, 2009). Data sets are described in Supplementary Table S1. Samples were profiled using whole-genome DNA microarrays: Affymetrix U133 Plus 2.0 (Yamaguchi et al, 2008; Ostrowski et al, 2009; Astolfi et al, 2010), Agilent 44K (Rink et al, 2009; Lagarde et al, 2012), and in-house microarrays spotted at the Genome Institute of Singapore (Kang et al, 2005). The study was approved by our Institutional Board.

Table 1 Histoclinical characteristics of the 146 GIST samples

Characteristics (N)                  N (%)
Sex (146)
  F                                  56 (38)
  M                                  90 (62)
Age, years (86)
  ≤60                                38 (44)
  >60                                48 (56)
Site (146)
  G                                  116 (79)
  SI                                 21 (14)
  Other                              9 (7)
Size, cm (86)
  <5                                 36 (42)
  5-10                               32 (37)
  >10                                18 (21)
AFIP (146)
  Low                                77 (53)
  Intermediate                       30 (21)
  High                               39 (27)
Mutation (145)
  KIT exon 9                         9 (6)
  KIT exon 11                        93 (64)
  KIT exon 13                        1 (1)
  KIT exon 17                        1 (1)
  PDGFRA exon 12                     5 (3)
  PDGFRA exon 14                     1 (1)
  PDGFRA exon 18                     18 (12)
  Wild type                          17 (12)
Relapse (146)
  0 (no)                             114 (78)
  1 (yes)                            32 (22)
5-year RFS (86)                      79% (CI 95% 69-90)
Median follow-up, months (range)     47 (2-165)

Abbreviations: F = female; G = gastric; GIST = gastrointestinal stromal tumour; M = male; RFS = relapse-free survival; SI = small intestine.

Gene expression data analysis 

Before analysis, we re-annotated all hybridisation probes present on the three types of microarrays (Affymetrix, Agilent, and Singapore microarrays). Affymetrix GeneChip annotations were updated using NetAffx Annotation files (www.affymetrix.com; release of 1 December 2008). Agilent annotations were retrieved and updated using both SOURCE (http://smd.stanford.edu/cgi-bin/source/sourceSearch) and EntrezGene (Homo sapiens gene information database, release of 9 December 2008, ftp://ftp.ncbi.nlm.nih.gov/gene/). For the Singapore chips, the 18 664 probe sets representing 18 664 unique (LEADS) genes were updated using both SOURCE and EntrezGene. All probes were thus mapped to their EntrezGene ID. When multiple probes mapped to the same GeneID, the probe with the highest variance in a given data set was selected to represent that GeneID.
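The probe-to-gene collapsing rule above can be sketched in Python as follows. This is a minimal illustration, not the pipeline actually used: it assumes each data set is held as a dictionary mapping probe IDs to per-sample expression values, and that the probe-to-GeneID mapping has already been obtained from the annotation sources. The function name, probe IDs, gene labels, and values are hypothetical.

```python
from statistics import pvariance


def collapse_probes_by_variance(expr, probe_to_gene):
    """For each GeneID, keep the single probe with the highest variance.

    expr: probe ID -> list of expression values (one per sample, one data set)
    probe_to_gene: probe ID -> GeneID (probes without a mapping are absent)
    Returns: GeneID -> (selected probe ID, its expression values)
    """
    best = {}  # GeneID -> (variance, probe ID) of the best probe seen so far
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # probe could not be re-annotated: drop it
        var = pvariance(values)
        if gene not in best or var > best[gene][0]:
            best[gene] = (var, probe)
    return {gene: (probe, expr[probe]) for gene, (_, probe) in best.items()}


# Hypothetical toy data set: two probes map to geneA, one to geneB,
# and one probe has no GeneID after re-annotation.
expr = {
    "p1": [1.0, 2.0, 3.0],
    "p2": [2.0, 2.1, 1.9],
    "p3": [5.0, 5.0, 5.0],
    "p4": [0.0, 9.0, 0.0],
}
probe_to_gene = {"p1": "geneA", "p2": "geneA", "p3": "geneB"}
collapsed = collapse_probes_by_variance(expr, probe_to_gene)
print({gene: probe for gene, (probe, _) in collapsed.items()})
```

Because the selection is made within each data set, the same GeneID may be represented by different probes on different platforms.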

Data sets were then processed as follows. For the Agilent-based sets and the Singapore set, we used the available processed data. For the Affymetrix-based data sets, we applied Robust Multichip Average (RMA) to the raw data and then used distance weighted discrimination to make the sets comparable with each other, using the Yamaguchi set as reference. Robust Multichip Average was run in R using Bioconductor (Seattle, WA, USA) and associated packages.

Unsupervised hierarchical clustering was applied to the Affymetrix data only. Before clustering, a filter retained the 12 619 genes with the most variable expression across all samples (s.d. >0.25). Clustering was done with the Cluster program (Eisen et al, 1998) using Pearson's uncentered correlation as the similarity metric and centroid linkage. Results were displayed using the TreeView program (Eisen et al, 1998). Biologically relevant gene clusters were identified and summarised as metagenes for prognostic analysis. The metagene expression value was the mean of the normalised expression values of all genes in the respective gene subset. Once computed, the metagenes were tested as continuous variables for their prognostic value with respect to RFS using Cox regression univariate and multivariate analyses.
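A minimal Python sketch of the variance filter, the metagene summary, and the uncentered Pearson correlation used by Cluster is given below. It assumes expression values are already normalised and stored per gene as lists over samples; taking the s.d. as the sample standard deviation is our assumption (the text does not name the estimator), and the gene values are made up for illustration.

```python
import math
from statistics import stdev


def filter_by_sd(expr, threshold=0.25):
    """Keep genes whose expression s.d. across samples exceeds the threshold."""
    return {gene: values for gene, values in expr.items() if stdev(values) > threshold}


def metagene(expr, genes):
    """Per-sample mean of the normalised expression of the genes in a cluster."""
    n_samples = len(expr[genes[0]])
    return [sum(expr[g][j] for g in genes) / len(genes) for j in range(n_samples)]


def uncentered_pearson(x, y):
    """Pearson correlation without mean-centring (i.e. cosine similarity)."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))


# Made-up normalised values for three genes in three samples.
expr = {
    "MKI67": [1.0, 2.0, 3.0],
    "TOP2A": [1.5, 2.5, 3.5],
    "ACTB": [5.0, 5.1, 5.0],
}
kept = filter_by_sd(expr)  # ACTB (s.d. about 0.06) is filtered out
proliferation = metagene(kept, ["MKI67", "TOP2A"])
print(sorted(kept), proliferation)
print(uncentered_pearson(expr["MKI67"], expr["TOP2A"]))
```

The resulting metagene values, one per sample, are what would then enter the Cox models as continuous covariates.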

Supervised analysis searched for a GES associated with RFS. The data set was split into a learning set (64 Affymetrix samples; Yamaguchi et al, 2008; Ostrowski et al, 2009; Astolfi et al, 2010) and a validation set (82 independent, non-Affymetrix samples; Kang et al, 2005; Rink et al, 2009; Lagarde et al, 2012). Cox regression analysis (Wald test) was applied to the 17 870 genes using a false discovery rate (FDR) threshold of 1% to adjust for multiple hypothesis testing. The robustness of the signature was then tested in the independent validation set by classifying samples according to the Pearson correlation coefficient between their expression profile and the mean profile of the learning-set samples with relapse. The resulting gene list was interrogated using the Ingenuity Pathway Analysis (IPA) software (Redwood City, CA, USA) to assess significant representation of biological pathways (P-value significance threshold 0.001).
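The two steps described above, selecting genes at a 1% FDR and then classifying new samples by their correlation with the relapse centroid, can be sketched as follows. The text does not name the FDR procedure or the correlation cutoff, so the Benjamini-Hochberg step-up procedure and the cutoff of 0 used here are assumptions; the per-gene Cox P-values are taken as given, and all numbers are toy values.

```python
import math


def benjamini_hochberg(pvalues, fdr=0.01):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up rule."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0  # largest rank k with p_(k) <= k / m * fdr
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * fdr:
            k_max = rank
    return sorted(order[:k_max])


def pearson(x, y):
    """Ordinary (mean-centred) Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def relapse_centroid(profiles, relapsed):
    """Gene-wise mean profile of the learning-set samples that relapsed."""
    chosen = [p for p, r in zip(profiles, relapsed) if r]
    return [sum(col) / len(chosen) for col in zip(*chosen)]


def classify(profile, centroid, cutoff=0.0):
    """'high-risk' if the profile correlates with the relapse centroid above cutoff."""
    return "high-risk" if pearson(profile, centroid) > cutoff else "low-risk"


# Toy per-gene Cox P-values: only the first two pass the 1% FDR.
selected = benjamini_hochberg([0.0001, 0.004, 0.02, 0.5], fdr=0.01)

# Toy learning set restricted to three signature genes.
learning = [[2.0, 1.0, 0.0], [4.0, 1.0, 2.0], [0.0, 3.0, 1.0]]
centroid = relapse_centroid(learning, [True, True, False])
print(selected, centroid)
print(classify([5.0, 1.0, 1.0], centroid), classify([0.0, 2.0, 2.0], centroid))
```

In practice the profiles would be restricted to the signature genes available on each platform before computing correlations.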

We also tested the prognostic value of three proliferation-based GES previously reported in breast cancer and soft tissue sarcoma: the Genomic Grade Index (GGI), which includes 128 probe sets (108 genes; Sotiriou et al, 2006), the 16-kinase signature (16-Kinase), which includes 16 kinase genes (Finetti et al, 2008), and the CINSARC signature, which includes 67 genes (Chibon et al, 2010). Each signature was applied separately in each of the five data sets to maximise the number of genes in common with the signature. We then strictly applied the same method (score or correlation, cutoffs, and scaling methods) as reported in the original publications to classify each sample into the prognostic group 'low-risk' or 'high-risk'. More details are available in Supplementary Table S2. Another supervised analysis comparing the gene expression profiles of the two GGI-based sample groups was done in the Affymetrix data set using significance analysis of microarrays (SAM) with an FDR threshold of 1%.
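To illustrate the 'score' type of classifier, the sketch below follows the GGI as we understand its published definition (Sotiriou et al, 2006): the summed expression of the genes up-regulated in histological grade 3 minus that of the genes up-regulated in grade 1, shifted by an offset and multiplied by a scale chosen so that reference grade 1 and grade 3 tumours average -1 and +1, with positive values read as 'high-risk'. The gene lists, reference scores, and sample values are hypothetical; the probe sets, scaling, and cutoffs actually used are those given in Supplementary Table S2, not this sketch.

```python
def raw_grade_score(sample, grade3_genes, grade1_genes):
    """Summed expression of grade-3-up genes minus that of grade-1-up genes."""
    return sum(sample[g] for g in grade3_genes) - sum(sample[g] for g in grade1_genes)


def calibrate(ref_low_scores, ref_high_scores):
    """Scale and offset mapping the mean reference low/high raw scores to -1/+1."""
    m_low = sum(ref_low_scores) / len(ref_low_scores)
    m_high = sum(ref_high_scores) / len(ref_high_scores)
    offset = (m_low + m_high) / 2
    scale = 2 / (m_high - m_low)
    return scale, offset


def ggi(raw, scale, offset):
    """Rescaled grade index: scale * (raw score - offset)."""
    return scale * (raw - offset)


def ggi_group(value):
    """Assumed cutoff at 0 between the two genomic grade groups."""
    return "high-risk" if value > 0 else "low-risk"


# Hypothetical gene lists and reference raw scores.
grade3_genes, grade1_genes = ["geneA", "geneB"], ["geneC"]
scale, offset = calibrate(ref_low_scores=[-3.0, -1.0], ref_high_scores=[2.0, 4.0])

sample = {"geneA": 2.0, "geneB": 1.5, "geneC": 0.5}
value = ggi(raw_grade_score(sample, grade3_genes, grade1_genes), scale, offset)
print(round(value, 3), ggi_group(value))
```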



Statistical analyses 

Correlations between sample groups and histoclinical factors were assessed with Fisher's exact test for qualitative variables and the Mann-Whitney test for continuous variables. The primary endpoint was RFS, calculated from the date of diagnosis until the date of first relapse, whatever its location, using the Kaplan-Meier method. In one data set (Lagarde et al, 2012; 60 samples), the presence or absence of relapse was documented without information on the time to relapse. In the whole data set (N=146), 114 patients did not experience any relapse, whereas 32 relapsed. In the 86 cases with available time information, the median follow-up (measured from the date of diagnosis to the date of last follow-up) was 47 months (range, 2-165) for the 71 relapse-free patients; the median time to relapse was 19 months (range, 5-74) for the 15 patients who relapsed, and the 5-year RFS was 79% (CI 95% (69-90)). Survival was compared between groups with the log-rank test. Univariate and multivariate analyses were done using Cox regression analysis when time information was available, or logistic regression using the glm function (R stats package) when it was not (significance estimated by specifying a binomial family with a logit link). The variables tested in univariate analyses included the sample classification based on each GES ('high-risk' vs 'low-risk') and the AFIP classification (high vs intermediate vs low risk). Multivariate analysis incorporated all variables with a P-value below 1% in univariate analysis. The Cox multivariate analysis was done with the Cox proportional hazards model or with Cox regression using Firth's correction (R software, Bioconductor), depending on whether relapses occurred in the reference group. All statistical tests were two-sided at the 5% level of significance. Statistical analysis was done using the survival package (version 2.30) in the R software (version 2.9.1). The paper is written in accordance with the reporting recommendations for tumour marker prognostic studies (REMARK) criteria (McShane et al, 2005).
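The Kaplan-Meier estimate of RFS used throughout can be written in a few lines. The sketch below is illustrative only (the analyses themselves used the R survival package): times are in months from diagnosis, an event is a relapse, and relapse-free patients are censored at their last follow-up. The six patients are made up.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up in months; events: 1 = relapse, 0 = censored.
    Returns a list of (event time, estimated RFS just after that time).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        relapses = removed = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            relapses += data[i][1]
            removed += 1
            i += 1
        if relapses:
            surv *= 1 - relapses / at_risk
            curve.append((t, surv))
        at_risk -= removed  # relapsed and censored patients leave the risk set
    return curve


def rfs_at(curve, t):
    """Step-function lookup of the estimated RFS at time t."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


# Made-up follow-up for six patients (months, relapse indicator).
curve = kaplan_meier([5, 12, 20, 30, 60, 70], [1, 0, 1, 0, 0, 1])
print(curve)
print("5-year RFS:", rfs_at(curve, 60))
```

Censored patients contribute to the risk set up to their last follow-up, which is why the estimate differs from a simple proportion of relapse-free patients.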



RESULTS 

Whole-genome gene expression profiles 

Hierarchical clustering was applied to the 64 samples and the 12 619 most variable genes of the Affymetrix data set (Supplementary Figure S1). As reflected by the dendrogram, the samples displayed heterogeneous expression profiles (Supplementary Figures S1A and B). They were sorted into two main clusters of 32 samples each (I and II), which correlated with the anatomical site (69% gastric in cluster I vs 100% in cluster II, P=8.5E-04) and the mutation status (84% KIT-mutated in cluster II vs 48% in cluster I, and 3% PDGFRA-mutated in cluster II vs 29% in cluster I; P=3.2E-04). No significant correlation existed with the AFIP classification or clinical outcome, even though cluster II included more AFIP high-risk patients (34% vs 15% in cluster I; P=0.17) and more patients who relapsed during follow-up (22% vs 9%; P=0.3; HR=2.67).

Several gene clusters were identified. Some of them defined expression signatures related to specific cell types, biological functions, or chromosomal locations (Supplementary Figures S1A-C). A proliferation cluster included two classical proliferation markers (MKI67 and PCNA) and many genes involved in the cell cycle and mitosis, such as AURKA/B, cyclin genes, TOP2A, TTK, and BUB1. Visually, this cluster was more highly expressed in AFIP high-risk samples, in agreement with their likely higher mitotic index. An immune cluster reflected variation in specific immune cell types within the tumours. It included genes encoding markers of B cells (immunoglobulins and HLA molecules), T cells (CD3, ZAP70, and CD4), NK cells (NKG7 and CD244), and macrophages (CD68, CD14, and CSF1R). Visually, this cluster was more highly expressed in PDGFRA-mutated samples, in agreement with their classically high lymphocyte infiltrate (Subramanian et al, 2004). An early response gene cluster, including for example FOS, ATF3, DUSP2, and NR4A2, appeared more highly expressed in PDGFRA-mutated samples, as expected (Subramanian et al, 2004). A cluster enriched in genes involved in angiogenesis (VEGFA, TEK, FLT1/3, CDH5, and PECAM1) was overall overexpressed in samples without a KIT exon 11 mutation, as expected (Imamura et al, 2007). Finally, a cluster of co-expressed genes represented a presumptive lost chromosomal region (9p loss cluster, of which 15% of genes are located on the 9p chromosomal arm); its expression was visually negatively correlated with that of the proliferation cluster and associated with worse RFS, as previously reported (Schneider-Stock et al, 2005; Haller et al, 2008a). We tested the prognostic value of these five clusters computed as metagenes. The proliferation, 9p loss, and early response metagenes were associated with poor RFS in Cox univariate analysis (P<0.05, Wald test), whereas the other two (immune and vascular) were not. In multivariate analysis including the three significant metagenes, only the proliferation metagene remained significant.

We also confirmed the overexpression of some individual control genes in relation to the mutation status (Subramanian et al, 2004; Chi et al, 2010; Arne et al, 2011): KIT, CD34, ETV1, PROM1 (CD133), and RPS6KA1 in KIT-mutated samples, EPHA4 and DSG2 in KIT exon 9-mutated samples, and PDGFRA, IGFBP5, IGF1, SPON1, and PRKCA in PDGFRA-mutated samples.



GES for RFS 

We searched for a GES associated with RFS. The data set was split into a learning set and a validation set. To avoid additional normalisation, we used the Affymetrix set as the learning set (N=64) and the non-Affymetrix sets as the independent validation set (N=82). In the learning set (10 relapses), Cox analysis identified 275 discriminating genes (FDR 1%; Supplementary Table S3), including 129 genes overexpressed and 146 underexpressed in samples from patients who relapsed. As expected, the resulting classification of learning-set samples strongly correlated with clinical outcome (P=8.5E-04, Supplementary Figure S2A). More importantly, in the validation set (Supplementary Figure S2B), 39 out of 60 (65%) patients who did not relapse were classified by the signature as 'low-risk', and 21 out of 22 (95%) patients who relapsed were accurately predicted as 'high-risk'. The rate of accurate classification was 73% (P=5.4E-07), supporting the robustness of the signature. When applied to the whole series of samples (learning and validation sets pooled), 71 out of 114 (63%) patients who did not relapse were classified as 'low-risk', and 31 out of 32 (97%) patients who relapsed were accurately predicted as 'high-risk' (P=3.7E-10). Considering the 86 patients with available RFS time, the 5-year RFS was 97% (CI 95% (92-100)) in the 'low-risk' group (N=39) and 64% (CI 95% (50-84)) in the 'high-risk' group (N=47) as defined using the signature (P=0.0016, Figure 1A).
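The percentages reported for the validation set follow directly from the 2x2 counts. The short Python check below recomputes them, treating a 'high-risk' call as a positive prediction of relapse; the function name is ours.

```python
def classification_rates(tp, fn, tn, fp):
    """Sensitivity, specificity, and overall accuracy from 2x2 counts.

    tp: relapsed and predicted 'high-risk'; fn: relapsed but 'low-risk';
    tn: relapse-free and 'low-risk'; fp: relapse-free but 'high-risk'.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Validation set: 21 of 22 relapsed patients 'high-risk',
# 39 of 60 relapse-free patients 'low-risk'.
sens, spec, acc = classification_rates(tp=21, fn=1, tn=39, fp=21)
print(f"sensitivity {sens:.0%}, specificity {spec:.0%}, accuracy {acc:.0%}")
# prints: sensitivity 95%, specificity 65%, accuracy 73%
```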

The biological significance of the signature was interrogated 
using the IPA gene ontology software (Supplementary Table S4A). 
No significantly enriched canonical pathway was identified for the 
146 underexpressed genes. By contrast, analysis found a significant 
overrepresentation of cell cycle-associated pathways within the 
overexpressed genes. 
Figure 1 Time-dependent correlation between the classifications based on four prognostic GES and RFS in 86 GIST samples. Kaplan-Meier RFS curves of the two GIST groups (low-risk and high-risk) defined by (A) the 275-gene signature, (B) the GGI, (C) the 16-Kinase signature, and (D) CINSARC.



Table 2 Correlation between the classifications based on three prognostic GES and RFS in GIST samples in the two test sets, separately and pooled

Test set no. 1 (N = 86; time to RFS available)
Prognostic GES    Relapse: No    Yes    Total    P-value(a) / OR (95% CI)
GGI
  Low-risk                 62      4       66    5.83E-06
  High-risk                 9     11       20    17.9 (4.2-94.8)
16-Kinase
  Low-risk                 48      3       51    .04E-03
  High-risk                23     12       35    8.1 (1.9-49.3)
CINSARC
  Low-risk                 45      3       48    3.34E-03
  High-risk                26     12       38    6.8 (1.6-40.8)

Test set no. 2 (N = 60; time to RFS not available)
Prognostic GES    Relapse: No    Yes    Total    P-value(a) / OR (95% CI)
GGI
  Low-risk                 42      4       46    5.94E-09
  High-risk                 1     13       14    14.2 (12.5-5653.5)
16-Kinase
  Low-risk                 33      1       34    4.75E-07
  High-risk                10     16       26    48.75 (6.2-2246.3)
CINSARC
  Low-risk                 32      0       32    5.54E-08
  High-risk                11     17       28    Inf (9.4-Inf)

Pooled test sets (N = 146; all samples)
Prognostic GES    Relapse: No    Yes    Total    P-value(a) / OR (95% CI)
GGI
  Low-risk                104      8      112    3.70E-13
  High-risk                10     24       34    29.8 (10.0-100.2)
16-Kinase
  Low-risk                 81      4       85    3.77E-09
  High-risk                33     28       61    6.8 (5.3-71.1)
CINSARC
  Low-risk                 77      3       80    2.30E-09
  High-risk                37     29       66    19.7 (5.6-107.7)

Abbreviations: CI = confidence interval; CINSARC = complexity index in sarcomas; GES = gene expression signature; GGI = Genomic Grade Index; GIST = gastrointestinal stromal tumour; OR = odds ratio; RFS = relapse-free survival. (a) Fisher's exact test P-value (low-risk row), with the associated OR and CI (high-risk row).



Prognostic value of three proliferation-based GES 

Given the importance of proliferation genes in this signature, we tested whether three published proliferation-associated GES with prognostic value in breast cancer (GGI, 16-Kinase signature) and sarcomas (CINSARC) had prognostic value in GISTs. Analysis was done in two separate test sets (no. 1 and no. 2). We first analysed the 86-sample series (test set no. 1), for which the time to RFS was available. Each GES divided samples into 'low-risk' and 'high-risk' groups with different 5-year RFS (Figures 1B-D): respectively, 91% (CI 95% (82-100); N=66) and 46% (CI 95% (28-77); N=20) with the GGI (P=1.4E-06), 93% (CI 95% (85-100); N=51) and 59% (CI 95% (42-84); N=35) with the 16-Kinase signature (P=6.44E-04), and 92% (CI 95% (84-100); N=48) and 67% (CI 95% (53-86); N=38) with CINSARC (P=0.01). Table 2 shows the corresponding contingency tables. Although each signature gave significant results, the largest difference in RFS was observed with the GGI.
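Each panel row pair in Table 2 is a 2x2 contingency table tested with Fisher's exact test. A self-contained Python sketch of the two-sided test (summing the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one, the rule used by R's fisher.test) is given below for the GGI in test set no. 1. The sample odds ratio it returns, (62 x 11)/(4 x 9), about 18.9, is not expected to match the 17.9 in Table 2, which is presumably the conditional maximum-likelihood estimate that R's fisher.test reports.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more probable than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x):  # probability that the top-left cell equals x
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def sample_odds_ratio(a, b, c, d):
    """Unconditional (sample) odds ratio a*d / (b*c)."""
    return (a * d) / (b * c)


# GGI, test set no. 1: rows = low-/high-risk, columns = no relapse / relapse.
a, b, c, d = 62, 4, 9, 11
p_value = fisher_exact_two_sided(a, b, c, d)
odds_ratio = sample_odds_ratio(a, b, c, d)
print(f"P = {p_value:.2e}, sample OR = {odds_ratio:.1f}")
```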
Table 3 Univariate and multivariate analyses for RFS

Test set no. 1 (N = 86; time to RFS available)
Univariate: Cox regression; multivariate: Cox regression using Firth's bias reduction
                             Univariate                               Multivariate
Variable                     N    HR (95% CI)          P-value        N    HR (95% CI)          P-value
GGI: high vs low             86   9.93 (3.13-32.3)     2.12E-05       86   3.23 (1.12-10.83)    2.91E-02
AFIP: intermediate vs low    86   3.84E08 (0-Inf)      3.38E-08       86   11.2 (0.87-1.6E03)   0.06
AFIP: high vs low            86   2.17E09 (0-Inf)      -              86   44.1 (5.23-5.8E03)   3.01E-05
275-gene: poor vs good       86   12.64 (1.66-96.17)   1.43E-02       -    -                    -

Test set no. 2 (N = 60; time to RFS not available)
Univariate: logistic regression; multivariate: logistic regression
                             Univariate                       Multivariate
Variable                     N    Coefficient   P-value       N    Coefficient   P-value
GGI: high vs low             60   0.84          1.19E-13      60   0.60          3.16E-08
AFIP: intermediate vs low    60   0.29          9.17E-03      60   0.1           0.19
AFIP: high vs low            60   0.78          8.13E-11      60   0.41          5.83E-05
275-gene: poor vs good       60   0.63          1.22E-09      60   0.27          7.14E-04

Pooled test sets (N = 146; all samples)
Univariate: logistic regression; multivariate: logistic regression
                             Univariate                       Multivariate
Variable                     N    Coefficient   P-value       N    Coefficient   P-value
GGI: high vs low             146  0.63          <2.00E-16     146  0.35          1.25E-07
AFIP: intermediate vs low    146  0.20          <2.00E-16     146  0.12          0.05
AFIP: high vs low            146  0.67          8.13E-11      146  0.42          1.75E-10
275-gene: poor vs good       146  0.41          3.62E-10      146  0.15          2.88E-03

Abbreviations: CI = confidence interval; GGI = Genomic Grade Index; HR = hazard ratio; RFS = relapse-free survival.



For further validation in an independent set, we repeated the analysis in the 60 samples from Lagarde et al (2012). In this series, the time to relapse was not available (test set no. 2). The correlation between the resulting classification and relapse status was therefore analysed using contingency tables (Table 2). The correlation was significant for each signature, but again strongest for the GGI. When we pooled the two test sets (N=146), the rate of accurate classification was 88% for the GGI, 75% for the 16-Kinase signature, and 73% for CINSARC (Table 2). We therefore focused on the GGI in the following sections.

Univariate and multivariate analyses for RFS 

We compared the prognostic performances of the classifications 
based on the 275-gene GES, GGI, and AFIP (Table 3). Analysis was 
done in the two test sets, separately then pooled. 

In test set no. 1 (86 samples with available time to RFS), Cox univariate analysis confirmed the prognostic value of the three classifiers. In multivariate analysis, the GGI and the AFIP high-risk classification remained significant, suggesting independent prognostic value, whereas the AFIP intermediate-risk classification did not. The Genomic Grade Index sorted the AFIP intermediate/high-risk samples into 'low-risk' and 'high-risk' groups with respective 5-year RFS of 73% (CI 95% (52-100); N=22) and 35% (CI 95% (17-70); N=16; P=8.5E-03; Figure 2).

As independent validation, we repeated the analyses in test set no. 2 (the 60-sample series of Lagarde et al) using logistic regression. Univariate results were similar. In multivariate analysis, the GGI signature remained significant, whereas the AFIP intermediate-risk classification did not. The Genomic Grade Index sorted the AFIP intermediate/high-risk samples into two groups with different relapse rates: 4 out of 17 patients (24%) in the 'low-risk' group and 13 out of 14 (93%) in the 'high-risk' group (OR=35.7; P=1.8E-04). When we pooled the two test sets (N=146), similar results were observed (Table 3).

Figure 2 Correlation between the GGI-based classification and RFS in the AFIP intermediate/high-risk GIST samples. Kaplan-Meier RFS curves in the patients with available RFS time (N=38).

Comparison of the two GGI-based GIST groups

The GGI was thus an independent predictor of RFS in two independent GIST test sets. To explore the differences between the GGI 'low-risk' and 'high-risk' groups thus defined, we compared their histoclinical features (N=146; Supplementary Table S5). Significant differences existed for three parameters. Genomic Grade Index 'high-risk' samples were more frequently AFIP high-risk (P=1.0E-05) and larger than 10 cm (P=1.22E-04), and displayed more relapses (P=3.6E-13; OR=29.84) despite a similar follow-up. The GGI sorted the AFIP intermediate/high-risk samples into two groups with different relapse rates: 8 out of 39 patients (21%) in the 'low-risk' group and 24 out of 30 (80%) in the 'high-risk' group (OR=14.7; P=8.8E-07).

We then compared the whole-genome expression profiles of the two groups in the 64-sample Affymetrix series. Significance analysis of microarrays identified 254 discriminating genes (FDR 1%), including 192 genes overexpressed and 62 underexpressed in the 'high-risk' samples (Supplementary Table S6). Ontology analysis found an overrepresentation of canonical pathways within the overexpressed genes only (Supplementary Table S4B). Most of them were associated with cell cycle control and genome stability, for example 'mitotic roles of polo-like kinase', 'role of chk proteins in cell cycle checkpoint control', 'cell cycle: G2/M DNA damage checkpoint regulation', and 'cell cycle control of chromosomal replication'. Of note, 10 of the 16 significant pathways (62.5%) were also identified by IPA analysis of the genes overexpressed in the 275-gene GES.
Figure 3 Correlation between the GGI-based classification and
the response to neo-adjuvant imatinib. Correlation between the two 
GGI-based groups (high-risk and low-risk; N = 29) and the response to 
imatinib assessed as a continuous variable (box plot). The horizontal 
dashed line indicates the cutoff of tumour shrinkage that defines the 
responder status. The figures within the box plot indicate the number of 
patients in each of the four categories. 



GGI-based classification and response to imatinib 

A major and complementary issue is to determine whether GGI 'high-risk' patients are more sensitive to imatinib. We analysed a set of 28 pre-treatment samples from patients with advanced primary and recurrent operable GIST treated in a phase II trial (RTOG0132) with 8-12 weeks of neo-adjuvant imatinib, for whom the degree of clinical response (RECIST, comparing tumour measurements on computed tomography scans before and after imatinib) was available (Rink et al, 2009). The percentage of tumour shrinkage ranged from -76% to +21%.

We reproduced the two GGI-based groups. As expected given the advanced stage of disease, more patients were classified as 'high-risk' (N=24) than 'low-risk' (N=4). Despite the small number of samples, the correlation between these groups and the degree of response (continuous variable) was significant, with greater tumour shrinkage in the 'high-risk' group than in the 'low-risk' group (mean -29% vs -5%, P=0.04; Figure 3). Using the response definition of the original paper (binary variable), the association was strong, although not significant owing to the small number of cases: 17 out of 24 'high-risk' samples (71%) were classified as rapid responders, whereas 3 out of 4 'low-risk' samples (75%) were classified as non-responders (P=0.116), with an OR for response of 6.7 in the 'high-risk' group compared with the 'low-risk' group. These results suggest that GGI 'high-risk' patients are more sensitive to imatinib than 'low-risk' patients.



DISCUSSION 

The risk of postoperative relapse in GIST patients dictates whether adjuvant imatinib is given and how patients are followed up. Risk stratification systems based on histoclinical features exist, but they do not fully resolve the prognostic heterogeneity of the disease. In Europe, the current gold standard is the AFIP classification, and all intermediate- or high-risk patients are candidates for imatinib, although many of them are in fact cured by surgery alone and do not need any postoperative therapy. Better identifying these patients would avoid not only imatinib-related toxicity but also the financial cost of at least 1-3 years of treatment. Efforts are ongoing to improve these stratification systems (Gold et al, 2009; Patel, 2011; Rossi et al, 2011; Joensuu et al, 2012). Although GIST is the most frequent digestive sarcoma, it has only recently been defined as an entity, and research on DNA microarrays and GIST prognosis has suffered from a lack of large clinically documented series (60 cases for the largest), precluding the use of independent learning and validation sets. We analysed gene expression and histoclinical data of 146 localised GISTs treated with surgery alone, without adjuvant imatinib, to assess the prognostic performance of gene expression profiles. To our knowledge, this is the largest series reported to date, and the first to include independent validation.

Unsupervised analysis confirmed the transcriptional heterogeneity
of the disease and expression differences associated with
mutation status. Supervised analysis was then used to search for a
GES associated with RFS. As recommended, our data set was split into
a learning set and a validation set. We identified a 275-gene GES
associated with RFS in the learning set and validated its robustness
in the independent validation set. As this GES was mainly driven by
proliferation genes, we hypothesised that three robust prognostic
signatures associated with cell cycle control might be predictive of
RFS in GIST. These signatures had been identified using
hypothesis-driven approaches in breast cancer (by comparing
histological grade III vs I for GGI, and basal vs luminal subtype for
the 16-Kinase) and in sarcoma (CINSARC, by comparing samples
with high vs low numbers of genomic alterations and high vs low
histological grade). Of note, none of the GIST samples had been
used to generate these signatures. Nevertheless, we divided our data
set into two independent test sets and, interestingly, each signature
had prognostic value in each set. As GGI was the most significant
signature, we compared its prognostic performance with that of the
275-gene GES and the AFIP classification. We found that GGI and
the AFIP high-risk classification were independent prognosticators
in both test sets. Genomic Grade Index provided information
additional to AFIP by discriminating, among intermediate/high-risk
AFIP patients, those with a good prognosis (GGI 'low-risk'),
who are unlikely to need adjuvant imatinib, from those with a
poor prognosis (GGI 'high-risk'), who are likely to need imatinib.
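To make the GGI-based split more concrete, the sketch below follows our understanding of how the GGI was originally built in breast cancer: a raw score (summed expression of genes up-regulated in high-grade tumours minus summed expression of down-regulated genes) is rescaled so that the mean score is -1 in a low-grade reference group and +1 in a high-grade reference group, and samples with a positive score are called 'high-risk'. The gene lists, the reference groups, the zero threshold and all function names (`raw_score`, `fit_scale_offset`, `ggi`, `classify`) are illustrative assumptions; this is a hypothetical sketch, not the exact procedure used in this study.

```python
from statistics import mean


def raw_score(sample, up_genes, down_genes):
    """Summed expression of up-regulated genes minus that of down-regulated genes.

    sample is a dict mapping (hypothetical) gene names to expression values.
    """
    return sum(sample[g] for g in up_genes) - sum(sample[g] for g in down_genes)


def fit_scale_offset(low_grade_scores, high_grade_scores):
    """Choose scale and offset so that the rescaled mean raw score is -1 in the
    low-grade reference group and +1 in the high-grade reference group."""
    m_low, m_high = mean(low_grade_scores), mean(high_grade_scores)
    scale = 2.0 / (m_high - m_low)
    offset = (m_high + m_low) / 2.0
    return scale, offset


def ggi(sample, up_genes, down_genes, scale, offset):
    """Rescaled genomic-grade-like score for one sample."""
    return scale * (raw_score(sample, up_genes, down_genes) - offset)


def classify(score, threshold=0.0):
    """Assumed rule: a score above the threshold is called 'high-risk'."""
    return "high-risk" if score > threshold else "low-risk"
```

For example, with toy raw scores of -3 and -1 in the low-grade reference group and 4 and 6 in the high-grade group, `fit_scale_offset` returns a scale of 2/7 and an offset of 1.5, so a sample with raw score 6 is called 'high-risk' and one with raw score -1 is called 'low-risk'. These numbers are invented for illustration only.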
Interestingly, in a series of patients treated with neo-adjuvant
imatinib for primary GIST (Rink et al, 2009), we showed that
GGI 'high-risk' patients were more frequently rapid responders to
imatinib than 'low-risk' patients. If confirmed in a larger series,
this correlation would further increase the clinical interest of
GGI in GIST.

Gene Ontology terms associated with the genes discriminating the two
GGI-based groups showed that the poor prognosis of the 'high-risk'
group was mainly related to alterations in cell cycle control and
maintenance of genome stability, these tumours being more
proliferative and more unstable at the chromosomal level. This
was confirmed using GSEA and a GES of genome instability
(Carter et al, 2006; normalised enrichment score = 1.86,
P < 2.12E-16; data not shown). This higher instability likely
confers on 'high-risk' tumours the ability to accumulate molecular
alterations that allow them to metastasise. The prognostic importance
of proliferation in GIST is well known. The mitotic index is a
component of current risk stratification systems, but is subject to
interobserver and technical variability (Patel, 2011). The prognostic
value of the expression of cell cycle-regulatory genes has been
reported (Haller et al, 2005; Schneider-Stock et al, 2005; Sabah 
et al, 2006; Schmieder et al, 2008; Haller et al, 2008a, b; Romeo 
et al, 2009; Dorn et al, 2010; Fujita et al, 2012; Okamoto et al, 
2012). To date, four studies have analysed the prognostic value of
gene expression profiles (Koon et al, 2004; Yamaguchi et al, 2008;
Arne et al, 2011; Lagarde et al, 2012), but the largest included
only 60 cases. Here, using a stringent significance threshold in
our series of 146 samples, CCNB1 and CENPF (Koon et al, 2004) were
included in our 275-gene list, but CD133 (Arne et al, 2011) and
CD26 (Yamaguchi et al, 2008) were not; however, CD26 expression
was associated with RFS (P = 0.0085) using less stringent criteria.
Recently, Lagarde et al (2012) showed the prognostic value of
CINSARC and AURKA expression, and developed a Genomic
Index, defined from array-CGH data as a score of genomic
instability associated with metastasis-free survival. Unfortunately,
the authors did not validate this prognosticator in an
independent sample set. Here, we confirmed the prognostic value
of CINSARC, but showed that GGI was a stronger prognosticator
in two independent test sets.
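Comparing RFS between two signature-defined groups within a test set is commonly done with a log-rank test. The sketch below is a minimal standard-library implementation of the two-group log-rank statistic, with the p-value taken from a chi-square distribution with one degree of freedom. This section does not state which statistical tests were used, so the choice of test, the function name `logrank` and the input layout are assumptions for illustration only.

```python
import math


def logrank(times, events, groups):
    """Two-group log-rank test (hypothetical helper for illustration).

    times  -- follow-up time for each patient
    events -- 1 if relapse was observed at that time, 0 if censored
    groups -- 1 for one signature-defined group (e.g. 'high-risk'), 0 otherwise
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e == 1})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t, overall and in group 1.
        n = sum(1 for tt, _, _ in data if tt >= t)
        n1 = sum(1 for tt, _, g in data if tt >= t and g == 1)
        # Relapses occurring exactly at time t, overall and in group 1.
        d = sum(1 for tt, e, _ in data if tt == t and e == 1)
        d1 = sum(1 for tt, e, g in data if tt == t and e == 1 and g == 1)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of the group-1 event count at time t.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance if variance > 0 else 0.0
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

A toy call such as `logrank([1, 2, 3, 10, 10, 10], [1, 1, 1, 0, 0, 0], [1, 1, 1, 0, 0, 0])`, in which group 1 relapses early and group 0 is censored late, yields a p-value below 0.05; the data are invented and unrelated to this study. In practice, an established survival-analysis package would normally be preferred over a hand-written test.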

Several genes included in the GGI and/or overexpressed in the
GGI 'high-risk' samples encode potential therapeutic targets
involved in cell cycle regulation that could, if functionally
validated, be targeted by new drugs in the adjuvant setting, alone or
in combination with imatinib: kinases (AURKA/B, BUB1, CDC2, CDK4,
CHEK1, NEK2, and PLK1/4) and a phosphatase (CDC25). Corresponding
inhibitors have entered cancer clinical trials with
promising results.

In conclusion, we show that a GGI-based classification of
operated GIST outperforms three other GES and the AFIP
intermediate-risk classification in prognostic performance. The
strength of our results lies in the size of our series (the largest
reported so far), the biological relevance of GGI, and its independent
prognostic value in two independent test sets across three
different technological platforms. Of note, GGI sorts the AFIP
intermediate/high-risk samples (the current candidates for adjuvant
imatinib) into two groups with different RFS. 'High-risk' tumours
according to GGI are more proliferative and genetically unstable
than 'low-risk' tumours, and respond more rapidly to imatinib. By
refining the prediction of RFS, GGI might help tailor adjuvant
imatinib. Further clinical validation of GGI in larger retrospective
and then prospective series is warranted, as is functional validation
of the relevant genes, which could provide new therapeutic targets.